The module PDB defines types and methods to work with protein structures inside Julia. It is useful to link structural and sequential information, and needed for measure the predictive performance at protein contact prediction of mutual information scores.
using MIToS.PDB
# Truncate IJulia outputs at:
ENV["LINES"] = 15
ENV["COLUMNS"] = 60;
This module exports the downloadpdb function, to retrieve a PDB file from PDB database. This function downloads a gzipped PDBML ("xml") file, which could be easily read it with MIToS by default, but you are able to determine the format as "pdb" if you want it.
pdbfile = downloadpdb("1IVO", format="pdb")
PDB module also exports a getpdbdescription to access the header information of a PDB entry.
getpdbdescription("1IVO")
This is easy using the read and parse functions, indicating the filename and the Format: PDBML for PDB "xml" files or PDBFile for usual "pdb" files. This functions returns a Vector of PDBResidue objects with all the residues in the PDB.
To return only a specific subset of residues/atoms you can use any of the following keyword arguments:
| keyword arguments | default | returns only ... |
|---|---|---|
chain |
"all" |
residues from a PDB chain, i.e. "A" |
model |
"all" |
residues from a determined model, i.e. "1" |
group |
"all" |
residues from a group: "ATOM", "HETATM" or "all" for both |
atomname |
"all" |
atoms with a specific name, i.e. "CA" |
onlyheavy |
false |
heavy atoms (not hydrogens) if it's true |
# Read α carbon of each residue from the 1ivo pdb file, in the model 1, chain A and in the ATOM group.
CA_1ivo = read(pdbfile, PDBFile, model="1", chain="A", group="ATOM", atomname="CA")
CA_1ivo[1] # First residue. It has only the α carbon.
MIToS parse PDB files to vector of residues, instead of using a hierarchical structure like other pages. This approach makes the search and selection of residues or atoms a little different. To make it easy, this module exports a number of functions and macros to find, select, collect particular residues or atoms.
Given the fact that residue numbers from different chains, models, etc. can collide, it's mandatory to indicate the model, chain, group, residue number and atom name in a explicit way to this functions or macros.
If you want to select all the residues in one of the categories, you are able to use the wildcard "*". You can also use regular expressions or functions to make the selections.
res_1ivo = read(pdbfile, PDBFile)
println("res_1ivo has ", length(res_1ivo), " residues.")
Dict of PDBResidues¶If you prefer a Dict of PDBResidue, indexed by their residue numbers, you can use the residuedict function or the @residuedict macro.
# Dict of residues from the model 1, chain A and from the ATOM group
residuesdict(res_1ivo, "1", "A", "ATOM", "*")
@residuesdict res_1ivo model "1" chain "A" group "ATOM" residue "*"
Use the residues function to collect specific residues.
It's possible to use a single residue number (i.e. "2"), a list or set of residue numbers (first example) or even a function which should return true for the selected residue numbers (last example). Also regular expressions can be used to select residues. Use "*" to select all the residues.
residue_list = map(string, 2:5)
# If the list is large, you can use a `Set` to gain performance
# residue_set = Set(map(string, 2:5))
first_res = residues(res_1ivo, "1", "A", "ATOM", residue_list)
for res in first_res
println(res.id.name, " ", res.id.number)
end
# Select all the residues of the model 1, chain A of the ATOM group with residue number less than 5
first_res = residues(res_1ivo, "1", "A", "ATOM", x -> parse(Int, match(r"^(\d+)", x)[1]) <= 5 )
# The anonymous function takes the residue number (string) and use a regular expression
# to extract the number (without insertion code).
# It converts the number to `Int` to test if the it is `<= 5`.
for res in first_res
println(res.id.name, " ", res.id.number)
end
Use the @residues macro for a cleaner syntax.
# You can use "*", regular expressions, lists, sets or functions also for model, chain and group:
# i.e. Takes the residue 10 from chains A and B
for res in @residues res_1ivo model "1" chain ["A","B"] group "ATOM" residue "10"
println(res.id.chain, " ", res.id.name, " ", res.id.number)
end
The atoms function or macro allow to select a particular set of …. .
# Select all the atoms with name starting with "C" using a regular expression
# from all the residues of the model 1, chain A of the ATOM group
carbons = @atoms res_1ivo model "1" chain "A" group "ATOM" residue "*" atom r"C.+"
carbons[1]
atoms(res_1ivo, "1", "A", "ATOM", "*", r"C.+")[1]
The PDB module offers a number of functions to measure distances between atoms or residues, to detect possible interactions or contacts. In particular the contact function calls the distance function using a threshold or limit. The measure can be done between alpha carbons ("CA"), beta carbons ("CB") (alpha carbon for glycine), any heavy atom ("Heavy") or any ("All") atom of the residues.
In the following example, whe are going to plot a contact map for the 1ivo chain A. Two residues will be considered in contact if their β carbons (α carbon for glycine) have a distance of 8Å or less.
pdb = @residues res_1ivo model "1" chain "A" group "ATOM" residue "*"
dmap = distance(pdb, criteria="CB") # Distance between residues
cmap = contact(pdb, 8.0, criteria="CB") # Contact map
# Pkg.add("Plots") # to install Plots
# Pkg.add("PlotlyJS") # to install PlotlyJS to use it as Plot backend
using Plots
plotlyjs()
heatmap(dmap, grid=false)
heatmap(convert(Matrix{Int},cmap), grid=false)
# Pkg.add("Plots") # to install Plots
# Pkg.add("GR") # to install GR to use it as Plot backend
using Plots
gr()
pdbfile = downloadpdb("2HHB")
res_2hhb = read(pdbfile, PDBML)
chain_A = pdb = @residues res_2hhb model "1" chain "A" group "ATOM" residue "*"
chain_C = pdb = @residues res_2hhb model "1" chain "C" group "ATOM" residue "*"
scatter3d(chain_A, label="A", alpha=0.5)
scatter3d!(chain_C, label="C", alpha=0.5)
superimposed_A, superimposed_C, RMSD = superimpose(chain_A, chain_C)
RMSD
scatter3d(superimposed_A, label="A", alpha=0.5)
scatter3d!(superimposed_C, label="C", alpha=0.5)